(54) Mechanism for storing information about recorded television broadcasts



(57) Program content, recorded to a storage medium such as a disk recorder, optical recorder or random access memory, is indexed by the replay file system. The file system maintains a storage location and program I.D. record for each recorded program. The file system further maintains other data obtained from an electronic program guide that may be accessed by downloading from the cable or satellite infrastructure or over the Internet. The file system also may store additional user data, such as the date and time the program was last viewed, together with any user-recorded indexes. The file system may be accessed through natural-language spoken input. The system includes a speech recognizer and natural language parser, coupled to a dialog system that engages the user in a dialog to determine what the user is interested in accessing from the storage medium. The natural language parser operates with a task-based grammar that is keyed to the electronic program guide data and user data maintained by the file system.
Description

Background and Summary of the Invention 

[0001] The present invention relates generally to interactive television and interactive "replay" TV. More particularly, the invention relates to a speech-enabled system for automatically creating a catalog describing the contents of each TV owner's "library" of stored television broadcasts. The user interacts with the system by speaking complex, natural-language requests for information. The speech recognizer and natural-language parser of the system interpret the meaning of the user's requests and locate those recorded items in the owner's library that best respond to the request. In this way, a user can readily access any previously recorded information without time-consuming searching.
[0002] The system may also maintain a database of user-specific information, such as which recorded programs the user has accessed most recently (or never). This information may be used, for example, to suggest to the user which recorded programs may be deleted when the recording device is nearly full.

[0003] For a more complete understanding of the invention, its objects and advantages, refer to the following specification and to the accompanying drawings.

Brief Description of the Drawings 

[0004] 

Figure 1 is a system block diagram of the system for storing information about recorded broadcasts, illustrating the presently preferred file system structure;

Figure 2 is a block diagram depicting the components of the natural language parser of the presently preferred embodiment of the invention; and

Figure 3 is a block diagram depicting the components of the local parser of the presently preferred embodiment of the invention.

Description of the Preferred Embodiment 

[0005] A basic interactive replay TV allows the user to specify which programs should be stored for future viewing. Current technology uses hard disk recorders to store the program content. In the future, hard disk recorders may be replaced by other media, including optical media and non-volatile random access memory.
[0006] Regardless of the type of storage media used, the basic problem is how to locate stored information at a later time.

[0007] The presently preferred embodiment provides an interactive, multimodal user interface for storing and retrieving information. The replay file system of the preferred embodiment captures information about each recorded program from the electronic program guide available via cable, satellite or Internet.
[0008] Referring to Figure 1, a storage medium, such as a hard disk recorder medium, is illustrated at 10. The medium may be suitably partitioned to store program content (i.e., recorded broadcasts) together with a file system content access table used to retrieve information at a later date. The stored program content, depicted diagrammatically at 12, may be stored on the medium 10 according to any suitable physical file storage structure. For example, the content may be stored in blocks of a predetermined size at specified starting locations within the storage medium.
[0009] The replay file system 14 used to access the stored program content may also be stored on medium 10, or alternatively on some other storage device or memory. The file system structure is illustrated generally at 16. The structure includes a storage location record 18 for each program recorded. The information stored in each storage location record may constitute a pointer or address into the medium 10, showing where a particular stored program content resides.
[0010] Associated with each storage location record is a collection of additional data that is extracted from the electronic program guide, as will be more fully discussed below. This additional information may include, for example, a program identifier record 20, which may be the name of the program or other suitable label. In addition, other electronic program guide data may be stored in association with each program I.D. This other electronic program guide data is illustrated generally at 22 and may include such additional information as the program category (movie, news, weather, etc.), the network that broadcasts the program content, the date and time of the broadcast, the actors starring in the broadcast, the director, and so forth. While this other electronic program guide data is not required to locate a recorded program if the program I.D. is known, the additional data is quite useful for enhancing interactive dialog between the user and the system when the program title or label is not known.
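To make the record layout of paragraphs [0009] and [0010] concrete, the following Python sketch models one storage location record together with its program identifier and other electronic program guide data, and looks up storage locations by any guide attribute. It is only an illustration: the class names, field names and dictionary keys are hypothetical, since the description does not prescribe a storage format.

```python
from dataclasses import dataclass, field


@dataclass
class ProgramRecord:
    """One entry of the replay file system (hypothetical field names)."""
    location: int        # storage location record 18, e.g. a block address on medium 10
    program_id: str      # program identifier record 20, e.g. the program title
    epg: dict = field(default_factory=dict)  # other EPG data 22: category, network, actors, ...


class ReplayFileSystem:
    def __init__(self) -> None:
        self.records = []

    def add(self, record: ProgramRecord) -> None:
        self.records.append(record)

    def locate(self, **wanted) -> list:
        """Return the storage locations of all records matching every requested attribute."""
        hits = []
        for rec in self.records:
            attrs = {"program_id": rec.program_id, **rec.epg}
            if all(self._matches(attrs.get(k), v) for k, v in wanted.items()):
                hits.append(rec.location)
        return hits

    @staticmethod
    def _matches(stored, wanted) -> bool:
        # List-valued attributes (e.g. actors) match when they contain the wanted value.
        if isinstance(stored, (list, tuple, set)):
            return wanted in stored
        return stored == wanted
```

A call such as `fs.locate(category="movie")` returns the locations of every recorded movie, which is the case the text describes where the title itself is not known.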

[0011] In addition, the replay file system may include associated user data records, illustrated at 24. These additional user data records may include, for example, the dates and times a particular program content was viewed, any recorded indexes the user has added to identify favorite parts of the program content, and so forth. The dates and times a program has been previously viewed can be used with a special feature of the system that mediates how contents of the medium may be selectively erased if the storage medium is nearly full. The user can record preferences in the user data record 24, indicating whether a particular recorded selection may be automatically erased after a predetermined time, selectively erased only after it has been viewed, or never erased unless explicitly requested by the user.
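The three erase preferences stored in user data record 24 can be read as a small policy table. The minimal sketch below uses hypothetical names (`ErasePolicy`, `UserData`, `erase_candidates`) and an arbitrary 14-day value for the "predetermined time"; it lists the recordings the stated policies allow the system to suggest for erasure, ordered by oldest activity first.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum
from typing import Optional


class ErasePolicy(Enum):
    AFTER_TIME = "erase automatically after a predetermined time"
    AFTER_VIEWED = "erase only after it has been viewed"
    NEVER = "never erase unless explicitly requested"


@dataclass
class UserData:
    """Hypothetical shape of a user data record 24."""
    program_id: str
    recorded_at: datetime
    last_viewed: Optional[datetime]
    policy: ErasePolicy


def erase_candidates(items, now, keep_for=timedelta(days=14)):
    """Program IDs the user's policies allow erasing, oldest activity first."""
    allowed = []
    for u in items:
        if u.policy is ErasePolicy.AFTER_TIME and now - u.recorded_at >= keep_for:
            allowed.append(u)
        elif u.policy is ErasePolicy.AFTER_VIEWED and u.last_viewed is not None:
            allowed.append(u)
        # ErasePolicy.NEVER items are never suggested.
    # Oldest activity first: last viewing time, or recording time if never viewed.
    allowed.sort(key=lambda u: u.last_viewed or u.recorded_at)
    return [u.program_id for u in allowed]
```

The function only proposes candidates; in the described system the dialog system would still ask the user before anything is erased.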

[0012] An important aspect of the presently preferred replay file system is that the information used to locate recorded program content does not need to be explicitly written into the file system by the user. Rather, the system automatically extracts the appropriate identifying information from the electronic program guide resource that is available from the cable television or satellite broadcast infrastructure or over the Internet. The system automatically extracts electronic program guide information when the user records program content. The system does this through one of several mechanisms, depending on the particular embodiment.
[0013] In one embodiment, the tuner 30 tunes to a particular channel so that program content 32 may be viewed by the user or stored in storage medium 10. The tuner may be connected to a suitable cable television infrastructure or satellite infrastructure, for example. While the tuner is accessing the program content, it also obtains the electronic program guide data 34 from the same cable or satellite infrastructure. Tuner 30 passes the electronic program guide information to the replay file system 14, where the appropriate information is extracted and included in the file system record for the recorded program.
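The automatic extraction described in paragraph [0013] amounts to finding the guide entry that was airing on the tuned channel when recording started and copying its attributes into the new file system record. The sketch below assumes a hypothetical EPG layout (a list of dicts with `channel`, `start`, `end` and `title` keys) and a hypothetical fallback label for the case where no guide entry is found.

```python
from datetime import datetime


def epg_entry_for(epg, channel: str, when: datetime):
    """Return the guide entry airing on `channel` at `when`, or None."""
    for entry in epg:
        if entry["channel"] == channel and entry["start"] <= when < entry["end"]:
            return entry
    return None


def record_program(file_system: list, epg, channel: str, when: datetime, location: int) -> dict:
    """File write step: build the record from guide data with no typing by the user."""
    entry = epg_entry_for(epg, channel, when)
    if entry is None:
        # Hypothetical fallback: label by channel and time so the item stays findable.
        record = {"location": location, "program_id": f"{channel} {when:%Y-%m-%d %H:%M}"}
    else:
        record = {"location": location, "program_id": entry["title"], "broadcast": entry["start"]}
        # Copy the remaining guide attributes (category, network, actors, ...) unchanged.
        record.update({k: v for k, v in entry.items() if k not in ("title", "start", "end")})
    file_system.append(record)
    return record
```

The same function would serve the Internet-based embodiment; only the source of the `epg` list changes.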

[0014] In an alternate embodiment, the electronic program guide information may be accessed from the Internet by a separate Internet access module 36. The Internet access module 36 can be coupled by cable modem to the Internet or by telephone to an Internet service provider. The Internet access module obtains relevant electronic program guide information pertaining to the program being viewed and stores this information in the replay file system.

[0015] Once the stored program content and its associated file system information have been recorded, the user has a number of different options for retrieving this recorded information. The system employs a sophisticated speech-enabled, multimodal user interface whereby the user can use a combination of speech and/or on-screen prompted input (through remote control pushbuttons or the like) to request recorded information for replay.

[0016] The speech recognizer 50 receives spoken input through a suitable microphone, which may be incorporated into the remote control, into a hands-free device placed on a nearby coffee table or the like, or into the storage device or television set. Output from the speech recognizer is supplied to a natural language parser 52.
[0017] The natural language parser of the preferred embodiment is a goal-oriented parser that uses pre-defined goal-oriented grammars to identify different user requests. The goal-oriented grammars are structured to correspond with the electronic program guide information by which the stored program content has been indexed.

[0018] The system includes a dialog system 54 that responds both to output from the natural language parser 52 and to on-screen prompted input. The dialog system has the ability to interact with the user, asking the user additional questions if necessary, in order to ascertain what stored program or programs the user is interested in retrieving. The dialog system is provided with a file system access module 56. This module accesses the replay file system records to return all file system records that match the user's request.
[0019] For example, the user could say to the system, "I would like to watch a movie." The dialog system would use its file system access module to ascertain whether there are any movies recorded on the storage medium. If there are numerous movies stored on the system, for instance, the dialog system may prompt the user to narrow the request. The prompt can be supplied as an on-screen prompt or a synthesized speech prompt, or both. In this case, the prompt might ask the user what category of movie he or she is interested in viewing, listing the categories for which there are currently stored programs. The user could then select the category, and the system would continue to prompt the user until the user selected one program for viewing.
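The narrowing dialog of paragraph [0019] can be sketched as a loop that filters the stored records, stops when one match remains, and otherwise asks about the next attribute while offering only values that are actually on the medium. The record layout, the `ask` callback (standing in for an on-screen or synthesized-speech prompt) and the order of follow-up questions are all hypothetical.

```python
def narrow(records, request, ask, order=("genre", "program_id")):
    """Prompt until a single recording matches `request`.

    records: dicts with "program_id" plus EPG attributes (hypothetical layout).
    request: constraints gathered so far, e.g. {"category": "movie"}.
    ask(attr, options): hypothetical prompt callback returning the user's choice.
    order: hypothetical sequence of follow-up questions.
    """
    request = dict(request)
    while True:
        hits = [r for r in records if all(r.get(k) == v for k, v in request.items())]
        pending = [a for a in order if a not in request]
        if len(hits) <= 1 or not pending:
            # One (or no) match, or no questions left: return the first match, if any.
            return hits[0] if hits else None
        attr = pending[0]
        # Offer only values for which recordings currently exist.
        options = sorted({r[attr] for r in hits if attr in r})
        request[attr] = ask(attr, options)
```

Offering only stored values mirrors the text's point that the prompt lists categories "for which there are currently stored programs".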
[0020] The dialog system 54 may also guide the user through other system operations, including the recording operation and other maintenance functions. The dialog system may be invoked, for example, when the disk is nearly full and the system determines, by checking the electronic program guide data, that the requested program will not fit on the remaining portion of the disk. The dialog system could prompt the user either to refrain from recording the program or to erase one or more previously recorded programs to make room.
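The fit check in paragraph [0020] needs a conversion from the program duration found in the guide data to bytes. The sketch below assumes a constant average recording bit rate (a hypothetical 4 Mbit/s default; real recorders vary) and a hypothetical list of erasable recordings already ranked, for example, by the user's erase preferences. It returns which of the two prompts the dialog system would offer.

```python
def bytes_needed(duration_min: float, bitrate_mbps: float = 4.0) -> int:
    # minutes -> seconds, megabits/s -> bits/s, bits -> bytes
    return int(duration_min * 60 * bitrate_mbps * 1_000_000 / 8)


def plan_recording(duration_min, free_bytes, erasable, bitrate_mbps=4.0):
    """Decide what the dialog system should propose before recording.

    erasable: (program_id, size_bytes) pairs in the order they may be erased
    (hypothetical input). Returns ("record", []), ("erase", [ids]) or ("refrain", []).
    """
    need = bytes_needed(duration_min, bitrate_mbps)
    if need <= free_bytes:
        return "record", []
    chosen = []
    for program_id, size in erasable:
        chosen.append(program_id)
        free_bytes += size
        if need <= free_bytes:
            return "erase", chosen
    # Even erasing every candidate would not free enough space.
    return "refrain", []
```

For a 120-minute program at the assumed 4 Mbit/s, `bytes_needed(120)` is 3,600,000,000 bytes.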

[0021] Figure 2 depicts components of the natural language parser 52 in more detail. In particular, speech understanding module 128 includes a local parser 160 to identify predetermined relevant task-related fragments. Speech understanding module 128 also includes a global parser 162 to extract the overall semantics of the speaker's request.

[0022] In the preferred embodiment, the local parser 160 utilizes multiple small grammars along with several passes and a unique scoring mechanism to provide parse hypotheses. For example, using this approach the novel local parser 160 recognizes phrases such as dates, names of people, and movie categories. If a speaker utters "record me a comedy in which Mel Brooks stars and is shown before January 23rd", the local parser recognizes "comedy" as a movie category, "January 23rd" as a date, and "Mel Brooks" as an actor. The global parser assembles those items (movie category, date, etc.) together and recognizes that the speaker wishes to record a movie with certain constraints.
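The division of labour in paragraph [0022] can be illustrated on its own example request. In this deliberately simplified sketch, plain regular expressions stand in for the local parser's multiple small grammars, passes and scoring mechanism, and the topic patterns, action words and "before/after" handling are hypothetical.

```python
import re

# One tiny pattern per topic (hypothetical stand-ins for the local parser's small grammars).
TOPIC_PATTERNS = {
    "category": re.compile(r"\b(comedy|drama|western|documentary)\b"),
    "date": re.compile(r"\b(January|February|March|April|May|June|July|August|"
                       r"September|October|November|December) \d{1,2}(?:st|nd|rd|th)?\b"),
    "actor": re.compile(r"\b(Mel Brooks|Marilyn Monroe)\b"),
}
ACTIONS = {"record": "record", "watch": "play", "play": "play"}
DATE_RELATIONS = {"before": "<", "after": ">"}


def local_parse(text):
    """Spot task-related fragments independently for each topic."""
    fragments = {}
    for topic, pattern in TOPIC_PATTERNS.items():
        match = pattern.search(text)
        if match:
            fragments[topic] = match.group(0)
    return fragments


def global_parse(text):
    """Assemble the fragments into one request carrying the overall goal."""
    fragments = local_parse(text)
    words = text.lower().split()
    goal = next((ACTIONS[w] for w in words if w in ACTIONS), None)
    request = {"goal": goal, **fragments}
    for word, op in DATE_RELATIONS.items():
        if "date" in fragments and f"{word} {fragments['date']}" in text:
            request["date_relation"] = op
    return request
```

On the sentence from the text, `global_parse` yields a record request with category "comedy", actor "Mel Brooks" and a date constraint "before January 23rd".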

[0023] Speech understanding module 128 includes knowledge database 163, which encodes the semantics of a domain (i.e., goal to be achieved). In this sense, knowledge database 163 is preferably a domain-specific database, as depicted by reference numeral 165, and is used by dialog manager 130 to determine whether a particular action related to achieving a predetermined goal is possible.

[0024] The preferred embodiment encodes the semantics via a frame data structure 164. The frame data structure 164 contains empty slots 166, which are filled when the semantic interpretation of global parser 162 matches the frame. For example, a frame data structure (whose domain is tuner commands) includes an empty slot for specifying the viewer-requested channel for a time period. If viewer 120 has provided the channel, then that empty slot is filled with that information. However, if that slot still needs to be filled after the viewer has made the initial request, then dialog manager 130 instructs computer response module 134 to ask viewer 120 to provide a desired channel.
[0025] The frame data structure 164 preferably includes multiple frames, each of which in turn has multiple slots. One frame may have slots directed to attributes of a movie, director, and type of movie. Another frame may have slots directed to attributes associated with the time in which the movie is playing, the channel, and so forth.
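The slot-filling behaviour of paragraphs [0024] and [0025] can be sketched with a frame whose slots start empty, a `fill` step that copies matching values from the global parser's interpretation, and a prompt for the first slot still missing. The class, slot names and prompt wording are hypothetical illustrations, not the embodiment's data structure.

```python
from dataclasses import dataclass, field


@dataclass
class Frame:
    """A semantic frame: named slots that start empty (None)."""
    domain: str
    slots: dict = field(default_factory=dict)

    def fill(self, interpretation: dict) -> None:
        # Copy only values for slots this frame defines; ignore everything else.
        for name, value in interpretation.items():
            if name in self.slots and value is not None:
                self.slots[name] = value

    def missing(self) -> list:
        return [name for name, value in self.slots.items() if value is None]


def next_prompt(frame: Frame):
    """What the dialog manager would ask next, or None when the frame is complete."""
    missing = frame.missing()
    return f"Which {missing[0].replace('_', ' ')} would you like?" if missing else None
```

For a tuner-command frame with `channel` and `time_period` slots, supplying only the time leaves `channel` empty, so the next prompt asks for the channel, as in the example in the text.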

[0026] The following reference discusses global parsers and frames: R. Kuhn and R. De Mori, Spoken Dialogues with Computers (Chapter 14: Sentence Interpretation), Academic Press, Boston (1998).
[0027] Dialog manager 130 uses dialog history data file 167 to assist in filling in empty slots before asking the speaker for the information. Dialog history data file 167 contains a log of the conversation that has occurred through the device of the present invention. For example, if a speaker utters "I'd like to watch another Marilyn Monroe movie," the dialog manager 130 examines the dialog history data file 167 to check what movies the user has already viewed or rejected in a previous dialog exchange. If the speaker had previously rejected "Some Like It Hot", then the dialog manager 130 fills the empty slot of the movie title with movies of a different title. If a sufficient number of slots have been filled, the system asks the speaker to verify and confirm the program selection. Thus, if any assumptions made by the dialog manager 130 through the use of dialog history data file 167 prove to be incorrect, the speaker can correct the assumption.
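The use of dialog history data file 167 in paragraph [0027] can be sketched as a filter over recorded items: titles logged as already viewed or rejected are skipped when proposing a value for the empty title slot. The item layout and the `(event, title)` log format are hypothetical; the proposed title is still only an assumption for the user to confirm.

```python
def propose_title(candidates, history, actor):
    """Pick a title for the empty title slot, skipping ones the history rules out.

    candidates: recorded items as dicts with "program_id" and "actors" (hypothetical layout).
    history: (event, title) pairs logged from earlier exchanges, where event is
    "viewed" or "rejected" (hypothetical log format).
    """
    ruled_out = {title for event, title in history if event in ("viewed", "rejected")}
    for item in candidates:
        if actor in item.get("actors", []) and item["program_id"] not in ruled_out:
            # This is an assumption; the caller should ask the user to confirm it.
            return item["program_id"]
    return None
```

Returning `None` when every matching title has been ruled out gives the dialog manager a clear signal to ask the speaker instead of guessing.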
[0028] The natural language parser 52 analyzes and extracts semantically important and meaningful topics from loosely structured, natural language text, which may have been generated as the output of an automatic speech recognition (ASR) system used by a dialogue or speech understanding system. The natural language parser 52 translates the natural language text input to a new representation by generating well-structured tags containing topic information and data, and associating each tag with the segments of the input text containing the tagged information. In addition, tags may be generated in other forms, such as a separate list or a semantic frame.

[0029] Robustness is a feature of the natural language parser 52, because the input can contain grammatically incorrect English sentences for the following reasons: the input to the recognizer is casual, dialog-style natural speech that can contain broken sentences and partial phrases, and the speech recognizer may insert, omit, or misrecognize words even when the speech input is considered correct. The natural language parser 52 deals robustly with all types of input and extracts as much information as possible.
[0030] Figure 3 depicts the different components of the local parser 160 of the natural language parser 52. The natural language parser 52 preferably utilizes generalized parsing techniques in a multi-pass approach as a fixed-point computation. Each topic is described as a context-sensitive LR (left-right and rightmost derivation) grammar, allowing ambiguities. The following are references related to context-sensitive LR grammars: A. Aho and J. D. Ullman, Principles of Compiler Design, Addison-Wesley Publishing Co., Reading, Massachusetts (1977); and M. Tomita, Generalized LR Parsing, Kluwer Academic Publishers, Boston, Massachusetts (1991).
[0031] At each pass of the computation, a generalized parsing algorithm is used to generate preferably all possible (both complete and partial) parse trees independently for each targeted topic. Each pass potentially generates several alternative parse-trees, each parse-tree representing a possibly different interpretation of a particular topic. The multiple passes through preferably parallel and independent paths result in a substantial elimination of ambiguities and overlap among different topics. The generalized parsing algorithm is a systematic way of scoring all possible parse-trees so that the (N) best candidates are selected utilizing the contextual information present in the system.
[0032] Local parsing system 160 is carried out in three stages: lexical analysis 220; parallel parse-forest generation for each topic (for example, generators 230 and 232); and analysis and synthesis of parsed components, as shown generally by reference numeral 234.
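The first of these stages, lexical analysis 220, tags topics that need no extensive grammar by means of a keyword or regular-expression scan and marks words outside the grammar lexicon with an X-tag, as paragraph [0033] describes. The minimal sketch below assumes hypothetical keyword lists for the lexical filters, a hypothetical grammar lexicon and a hypothetical `<TAG:value>` token format.

```python
import re

# Hypothetical lexical filters: keyword lists for topics that need no grammar.
LEXICAL_FILTERS = {
    "CATEGORY": ["comedy", "western", "documentary"],
    "ACTOR": ["mel brooks", "marilyn monroe"],
}
# Hypothetical lexicon of words the topic grammars can use.
GRAMMAR_LEXICON = {"record", "watch", "before", "after", "between", "and", "or",
                   "around", "january", "23rd", "channel", "7", "pm", "8"}


def lexical_analysis(sentence: str) -> list:
    """Tag keyword topics, keep grammar words, and replace all other words with "X"."""
    text = sentence.lower()
    for tag, keywords in LEXICAL_FILTERS.items():
        pattern = r"\b(" + "|".join(map(re.escape, keywords)) + r")\b"
        # Collapse each match, even a multi-word one, into a single tag token.
        text = re.sub(pattern, lambda m: f"<{tag}:{m.group(1).replace(' ', '_')}>", text)
    return [tok if tok.startswith("<") or tok in GRAMMAR_LEXICON else "X"
            for tok in text.split()]
```

Replacing noise words with "X" keeps their positions, so later stages can still measure gaps between covered words.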



Lexical analysis:

[0033] A speaker utters a phrase that is recognized by an automatic speech recognizer 217, which generates input sentence 218. Lexical analysis stage 220 identifies and generates tags for the topics (which do not require extensive grammars) in input sentence 218 using lexical filters 226 and 228. These include, for example, movie names; category of movie; producers; names of actors and actresses; and the like. A regular-expression scan of the input sentence 218 using the keywords involved in the mentioned exemplary tags is typically sufficient at this level. Also performed at this stage is the tagging of words in the input sentence that are not part of the lexicon of a particular grammar. These words are indicated using an X-tag, so that such noise words are replaced with the letter "X".

Parallel parse-forest generation:

[0034] The natural language parser 52 uses a high-level general parsing strategy to describe and parse each topic separately, and generates tags and maps them to the input stream. Due to the nature of unstructured input text 218, each individual topic parser preferably accepts as large a language as possible, ignoring all but important words and dealing with insertion and deletion errors. The parsing of each topic involves designing context-sensitive grammar rules using a meta-level specification language, much like the ones used in LR parsing. Examples of grammars include grammar A 240 and grammar B 242. Using the present invention's approach, topic grammars 240 and 242 are described as if they were LR-type grammars, containing redundancies and without eliminating shift and reduce conflicts. The result of parsing an input sentence is all possible parses based on the grammar specifications.
[0035] Generators 230 and 232 generate parse forests 250 and 252 for their topics. Tag generation is done by synthesizing actual information found in the parse tree obtained during parsing. Tag generation is accomplished via tag and score generators 260 and 262, which respectively generate tags 264 and 266. Each identified tag also carries information about what set of input words in the input sentence are covered by the tag. Subsequently the tag replaces its cover-set. In the preferred embodiment, context information 267 is utilized for tag and score generation, such as by generators 260 and 262. Context information 267 is utilized in the scoring heuristics for adjusting weights associated with a heuristic scoring factor technique that is discussed below. Context information 267 preferably includes word confidence vector 268 and dialogue context weights 269. However, it should be understood that the parser 52 is not limited to using both word confidence vector 268 and dialogue context weights 269, but also includes using one to the exclusion of the other, as well as not utilizing context information 267 at all.

[0036] Automatic speech recognition process block 217 generates word confidence vector 268, which indicates how well the words in input sentence 218 were recognized. Dialog manager 130 generates dialogue context weights 269 by determining the state of the dialogue. For example, dialog manager 130 asks a user about a particular topic, such as what viewing time is preferable. Due to this request, dialog manager 130 determines that the state of the dialogue is time-oriented. Dialog manager 130 provides dialogue context weights 269 in order to inform the proper processes to more heavily weight the detected time-oriented words.
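The description does not give a formula for combining word confidence vector 268 with dialogue context weights 269. One plausible assumption, sketched below, is a per-word product in which words belonging to the current dialog topic receive a boost; the word class, the boost factor of 2.0 and the function names are hypothetical.

```python
# Hypothetical word classes and boost factor; the text does not specify values.
WORD_CLASSES = {"time": {"pm", "am", "tonight", "morning", "evening", "8", "9"}}
CONTEXT_BOOST = 2.0


def context_weights(words, dialog_state):
    """Dialogue context weights 269: up-weight words tied to the current dialog topic."""
    topical = WORD_CLASSES.get(dialog_state, set())
    return [CONTEXT_BOOST if w in topical else 1.0 for w in words]


def word_weights(words, confidences, dialog_state):
    """Per-word weight = ASR confidence (vector 268) x context weight (269)."""
    return [c * k for c, k in zip(confidences, context_weights(words, dialog_state))]
```

With no recognized dialog state every context weight is 1.0, which corresponds to the case the text allows of using the confidence vector alone.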

Synthesis of Tag-components: 

[0037] The topic spotting parser of the previous stage generates a significant amount of information that needs to be analyzed and combined together to form the final output of the local parser. The parser 52 is preferably as "aggressive" as possible in spotting each topic, resulting in the generation of multiple tag candidates. Additionally, in the presence of numbers or certain keywords, such as "between", "before", "and", "or", "around", etc., and especially if these words have been introduced or dropped due to recognition errors, it is possible to construct many alternative tag candidates. For example, an input sentence could have insertion or deletion errors. The combining phase determines which tags form a more meaningful interpretation of the input. The parser 52 defines heuristics and makes a selection based on them using an N-best candidate selection process. Each generated tag corresponds to a set of words in the input word string, called the tag's cover-set.

[0038] A heuristic is used that takes into account the cover-sets of the tags used to generate a score. The score roughly depends on the size of the cover-set, the sizes (in number of words) of the gaps within the covered items, and the weights assigned to the presence of certain keywords. In the preferred embodiment, the ASR-derived confidence vector and dialog context information are utilized to assign priorities to the tags. For example, applying channel-tag parsing first potentially removes channel-related numbers that are easier to identify uniquely from the input stream, and leaves fewer numbers to create ambiguities with other tags. Preferably, dialog context information is used to adjust the priorities.
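The cover-set heuristic of paragraph [0038] names its ingredients but not how they are combined. The sketch below assumes a simple linear combination: the weighted cover-set size, minus a hypothetical penalty of 0.5 per uncovered word inside the tag's span, plus keyword weights. The `Tag` class and coefficient values are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tag:
    topic: str
    value: str
    cover: frozenset  # indices of the input words this tag covers (its cover-set)


def gap_words(cover) -> int:
    """Uncovered words lying between the first and last covered word."""
    return max(cover) - min(cover) + 1 - len(cover)


def score(tag, words, weights, keyword_weights=None, gap_penalty=0.5):
    """Hypothetical cover-set score.

    weights: per-word weight (ASR confidence x dialog context weight).
    keyword_weights: extra weight when certain keywords are in the cover-set.
    gap_penalty: hypothetical cost per uncovered word inside the tag's span.
    """
    keyword_weights = keyword_weights or {}
    size = sum(weights[i] for i in tag.cover)  # weighted cover-set size
    bonus = sum(keyword_weights.get(words[i], 0.0) for i in tag.cover)
    return size - gap_penalty * gap_words(tag.cover) + bonus
```

A contiguous date tag such as "before january 23rd" therefore outscores an equally long candidate that skips over a word, and the keyword "before" adds to its weight.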


N-Best Candidates Selection 

[0039] At the end of each pass, an N-best processor 270 selects the N-best candidates based upon the scores associated with the tags and generates the topic-tags, each representing the information found in the corresponding parse-tree. Once topics have been discovered this way, the corresponding words in the input can be substituted with the tag information. This substitution transformation eliminates the corresponding words from the current input text. The output 280 of each pass is fed back to the next pass as the new input, since the substitutions may help in the elimination of certain ambiguities among competing grammars or help generate better parse-trees by filtering out overlapping symbols.
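The N-best selection and substitution of paragraph [0039] can be sketched as follows. Here a greedy rule (take tags in descending score order, skipping any whose cover-set overlaps one already chosen) stands in for N-best processor 270; that rule and the dict-based tag layout are assumptions, not the patented selection procedure.

```python
def n_best(scored_tags, n):
    """Keep up to n highest-scoring tags whose cover-sets do not overlap (greedy choice)."""
    chosen, used = [], set()
    for _, tag in sorted(scored_tags, key=lambda pair: pair[0], reverse=True):
        if len(chosen) == n:
            break
        if used.isdisjoint(tag["cover"]):
            chosen.append(tag)
            used |= tag["cover"]
    return chosen


def substitute(words, tags):
    """Replace each tag's cover-set with one tag token; other words pass through."""
    start_of = {min(t["cover"]): t for t in tags}
    covered = set().union(*(t["cover"] for t in tags))
    out = []
    for i, word in enumerate(words):
        if i in start_of:
            t = start_of[i]
            out.append(f"<{t['topic']}:{t['value']}>")
        elif i not in covered:
            out.append(word)
    return out
```

In the test case, the high-scoring channel tag claims the number "7" first, so a competing time reading of that same number is discarded, echoing the channel-first example of paragraph [0038].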
[0040] Computation ceases when no additional tags are generated in the last pass; the output of the final pass becomes the output of the local parser to global parser 162. Since each phase can only reduce the number of words in its input and the length of the input text is finite, the number of passes in the fixed-point computation is linearly bounded by the size of its input.
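The fixed-point control loop of paragraphs [0039] and [0040] can be sketched independently of the parsing details: run a pass, feed its output back as the next input, and stop when a pass produces no new tags. The `one_pass` callback is a hypothetical stand-in for parse-forest generation, scoring, N-best selection and substitution, and `toy_pass` is an illustration only.

```python
def fixed_point_parse(words, one_pass):
    """Run passes until one produces no new tags; return (final words, passes run).

    one_pass(words) -> (new_words, n_new_tags) is a hypothetical callback.
    """
    passes = 0
    # Safety cap reflecting the linear bound stated in the text:
    # at most one productive pass per input word, plus the final empty pass.
    for _ in range(len(words) + 1):
        new_words, n_new = one_pass(words)
        passes += 1
        if n_new == 0:
            break
        words = new_words  # output of this pass is the next pass's input
    return words, passes


# Toy pass for illustration: tag the first known two-word phrase it finds.
PHRASES = [("channel", "7"), ("8", "pm")]


def toy_pass(words):
    for a, b in PHRASES:
        for i in range(len(words) - 1):
            if words[i] == a and words[i + 1] == b:
                return words[:i] + [f"<{a}_{b}>"] + words[i + 2:], 1
    return words, 0
```

Running `fixed_point_parse(["record", "channel", "7", "at", "8", "pm"], toy_pass)` takes three passes: two that each add one tag and a final pass that finds nothing new.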
[0041] The following scoring factors are used to rank the alternative parse trees, based on these attributes of a parse-tree:

• Number of terminal symbols.

• Number of non-terminal symbols.

• The depth of the parse-tree.

• The size of the gaps in the terminal symbols.

• ASR-Confidence measures associated with each terminal symbol.

• Context-adjustable weights associated with each terminal and non-terminal symbol.

[0042] Each path preferably corresponds to a separate topic that can be developed independently, operating on a small amount of data, in a computationally inexpensive way. The architecture of the parser 52 is flexible and modular, so incorporating additional paths and grammars for new topics, or changing heuristics for particular topics, is straightforward; this also allows developing reusable components that can be shared easily among different systems.

[0043] From the foregoing it will be seen that the system of the invention provides an interactive replay system with a dynamically built replay file system structure. Because the file system structure automatically extracts relevant information from the electronic program guide resources available via cable, satellite and/or Internet, the system requires very little effort on the part of the user, while allowing a very rich interactive dialog to locate and replay stored information.

[0044] While the invention has been described in its presently preferred form, it will be understood that the invention is capable of modification without departing from the spirit of the invention as set forth in the appended claims.

Claims

1. A file system for organizing recorded items of program content, comprising:

a memory having a data structure for storing the location of a recorded item of program content in association with at least one electronic program guide attribute selected from the group consisting of: program title identifier, program category, broadcasting network, date of broadcast, time of broadcast, actors, and director;

an interface for receiving electronic program guide attribute information about a specific item of program content to be stored;
a file write mechanism that automatically stores said electronic program guide attribute information about said item of program content to be stored in said data structure; and
a file read mechanism that ascertains the storage location of a recorded item of program content based on said at least one electronic program guide attribute accessed from said data structure.

2. The file system of claim 1 further comprising a natural language dialog system that interacts with said file read mechanism for supplying electronic program guide attribute inquiries to said read mechanism based on spoken requests of a user.

3. The file system of claim 1 wherein said data structure further includes at least one user data attribute in association with said location of a recorded item of program content.

4. The file system of claim 3 wherein said user data attribute stores a record of when the item of program content was viewed by a user.

5. The file system of claim 3 wherein said user data attribute stores at least one user-defined index that identifies a user-defined location within a recorded item.

6. The file system of claim 3 further comprising a storage system maintenance system that selectively erases previously recorded items of program content based at least in part upon said user data attribute.

7. The file system of claim 1 wherein said interface for receiving electronic program guide information comprises an internet access system capable of accessing at least one internet-based provider of electronic program guide information.

8. The file system of claim 1 wherein said interface for receiving electronic program guide information comprises a tuner for retrieving electronic program guide information from a supplier of program content.

9. A system for recording program content comprising a storage medium for storing at least one item of program content and a file system as defined in claim 1.



10. The system of claim 9 wherein said storage medium is a disk memory.

11. The system of claim 9 wherein said storage medium is a tape memory.

12. The system of claim 9 wherein said storage medium is an electronic memory.